[SPARK-33902][SQL] Support CREATE TABLE LIKE for V2 #54809
viirya wants to merge 7 commits into apache:master
## What changes were proposed in this pull request?

Previously, `CREATE TABLE LIKE` was implemented only via `CreateTableLikeCommand`, which bypassed the V2 catalog pipeline entirely. This meant:

- 3-part names (catalog.namespace.table) caused a parse error
- 2-part names targeting a V2 catalog caused `NoSuchDatabaseException`

This PR adds a V2 execution path for `CREATE TABLE LIKE`:

- Grammar: change `tableIdentifier` (2-part max) to `identifierReference` (N-part) for both target and source, consistent with all other DDL commands
- Parser: emit `CreateTableLike` (new V2 logical plan) instead of `CreateTableLikeCommand` directly
- `ResolveCatalogs`: resolve the target `UnresolvedIdentifier` to `ResolvedIdentifier`
- `ResolveSessionCatalog`: route back to `CreateTableLikeCommand` when both target and source are V1 tables/views in the session catalog (V1->V1 path)
- `DataSourceV2Strategy`: convert `CreateTableLike` to new `CreateTableLikeExec`
- `CreateTableLikeExec`: physical exec that copies schema and partitioning from the resolved source `Table` and calls `TableCatalog.createTable()`

## How was this patch tested?

- `CreateTableLikeSuite`: new integration tests covering V2 target with V1/V2 source, cross-catalog, views as source, IF NOT EXISTS, property behavior, and V1 fallback regression
- `DDLParserSuite`: updated existing `create table like` test to match the new `CreateTableLike` plan shape; added 3-part name parsing test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two tests covering the case where the source is a V2 table in a non-session catalog and the target resolves to the session catalog. These exercise the CreateTableLikeExec → V2SessionCatalog path and confirm that schema and partitioning are correctly propagated. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add two tests to CreateTableLikeSuite documenting that pure V2 catalogs (e.g. InMemoryCatalog) accept any provider string without validation, while V2SessionCatalog rejects non-existent providers by delegating to DataSource.lookupDataSource. This is consistent with how CreateTableExec handles the USING clause for other V2 DDL commands. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…CREATE TABLE LIKE

Two new tests covering previously untested code paths in CreateTableLikeExec:

- Source provider is copied to V2 target as PROP_PROVIDER when no USING override is given, consistent with how CreateTableExec handles other V2 DDL.
- CHAR(n)/VARCHAR(n) types declared on a V1 source are preserved in the V2 target via CharVarcharUtils.getRawSchema, not collapsed to StringType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add inline comment explaining the six reasons withConstraints is intentionally omitted: V1 behavior parity, ForeignKey cross-catalog dangling references, constraint name collision risk, validation status semantics on empty tables, NOT NULL already captured in nullability, and PostgreSQL precedent (INCLUDING CONSTRAINTS opt-in). Also notes the path forward if constraint copying is added in the future. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Clarify that V1 tables (CatalogTable) have no constraint objects at all since CHECK/PRIMARY KEY/UNIQUE/FOREIGN KEY are V2-only concepts added in Spark 4.1.0, rather than saying CreateTableLikeCommand "never copied" them which implies an intentional decision rather than absence of the feature. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed identifiers

After the CREATE TABLE LIKE V2 change, the target and source identifiers in CreateTableLikeCommand are now fully qualified (spark_catalog.default.*) because ResolvedV1Identifier explicitly adds the catalog component via ident.asTableIdentifier.copy(catalog = Some(catalog.name)), and ResolvedV1TableIdentifier returns t.catalogTable.identifier which also includes the catalog. Update the analyzer golden file accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
I'll take a look later today.
| // For CREATE TABLE LIKE, use the v1 command if both the target and source are in the session | ||
| // catalog (or a V1-compatible catalog extension). If source is in a different catalog, fall | ||
| // through to the V2 execution path (CreateTableLikeExec via DataSourceV2Strategy). | ||
| case CreateTableLike( |
What does this mean for DSv2 connectors that override the session catalog?
Yeah, agreed, we should add a test for the session catalog.
Good catch. When a connector like Iceberg overrides the session catalog, the target resolves through ResolvedV1Identifier (since supportsV1Command returns true for the session catalog), but the source — a native Iceberg Table — does NOT match ResolvedV1TableOrViewIdentifier (which requires V1Table). So ResolveSessionCatalog falls through and CreateTableLikeExec handles it, passing the Iceberg Table directly. The target createTable call goes to V2SessionCatalog, which delegates to the Iceberg catalog extension. This should work, but deserves a test. I can add one if you'd like.
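The routing rule described in this reply can be sketched with a small pure-Scala model. The types below are simplified stand-ins, not Spark's analyzer classes, and the `targetSupportsV1` flag is a hypothetical shorthand for "the target resolved through `ResolvedV1Identifier`":

```scala
// Simplified model of the V1/V2 routing decision; none of these are Spark classes.
object CreateTableLikeRouting {
  sealed trait Source
  case object V1TableOrView extends Source // session-catalog table or view backed by a CatalogTable
  case object NativeV2Table extends Source // e.g. a connector's own Table implementation

  sealed trait Path
  case object V1Command extends Path // stands for CreateTableLikeCommand
  case object V2Exec extends Path    // stands for CreateTableLikeExec

  // Only the V1 -> V1 combination goes back to the V1 command;
  // every other combination falls through to the V2 exec.
  def route(targetSupportsV1: Boolean, source: Source): Path =
    (targetSupportsV1, source) match {
      case (true, V1TableOrView) => V1Command
      case _                     => V2Exec
    }
}
```

Under this model, a session-catalog target with a native connector table as the source takes the `V2Exec` branch, which is the case the reply walks through.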
| // CHAR/VARCHAR types are preserved as declared (without internal metadata expansion). | ||
| val columns = sourceTable match { | ||
| case v1: V1Table => | ||
| val rawSchema = CharVarcharUtils.getRawSchema(v1.catalogTable.schema) |
Do we have tests for this?
Yes — the test "CHAR and VARCHAR types are preserved from v1 source to v2 target" in CreateTableLikeSuite covers this. It creates a V1 source with CHAR(10) and VARCHAR(20), runs CREATE TABLE testcat.dst LIKE src, and asserts schema("name").dataType === CharType(10) and schema("tag").dataType === VarcharType(20).
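To make the "preserved, not collapsed to StringType" point concrete, here is a rough pure-Scala sketch. It assumes the stored V1 schema exposes such columns as a plain string type while remembering the declared type on the side; the `Field` class and its `declared` slot are hypothetical and only model that idea, they are not how `CharVarcharUtils` is implemented:

```scala
// Hypothetical model of recovering declared CHAR/VARCHAR types from a stored schema.
object RawSchemaSketch {
  sealed trait DataType
  case object StringType extends DataType
  final case class CharType(length: Int) extends DataType
  final case class VarcharType(length: Int) extends DataType

  // `declared` remembers the originally declared type when the stored type was widened.
  final case class Field(name: String, dataType: DataType, declared: Option[DataType] = None)

  // Restore the declared type where one was recorded; leave other fields untouched.
  def rawSchema(fields: Seq[Field]): Seq[Field] =
    fields.map(f => f.declared.fold(f)(t => f.copy(dataType = t, declared = None)))
}
```

Copying the result of `rawSchema` to the target, rather than the stored fields, is what keeps `CHAR(10)` and `VARCHAR(20)` visible in the new table in this model.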
| case class CreateTableLikeExec( | ||
| targetCatalog: TableCatalog, | ||
| targetIdent: Identifier, | ||
| sourceTable: Table, |
Does this mean it would only work for creating V2 table from another V2 table?
Oh, this can be a V1Table that wraps a CatalogTable?
Correct on both. sourceTable: Table is the V2 Table interface, which can be any implementation. For session catalog sources, ResolveRelations wraps the CatalogTable in a V1Table, which implements Table. So V1→V2 works: the source is a V1Table and we handle it explicitly in the match block at line 57 to preserve CHAR/VARCHAR types.
| val partitioning = sourceTable.partitioning | ||
| // 3. Resolve provider: USING clause overrides, else copy from source. | ||
| val resolvedProvider = provider.orElse { |
Isn't this the source's provider rather than the target's? Can we actually populate this?
What does DSv1 do and is it applicable?
Yes, this is the source provider being copied to the target — which is exactly the semantics of CREATE TABLE LIKE: the target inherits the source's format unless overridden by a USING clause. This matches V1 CreateTableLikeCommand behavior, which also copies the source provider. The copied provider goes into PROP_PROVIDER in finalProps and is passed to catalog.createTable. Whether the target catalog uses it is catalog-specific: InMemoryCatalog stores it as-is; V2SessionCatalog validates it via DataSource.lookupDataSource.
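The precedence described here (USING clause first, then the source's provider) can be illustrated with plain `Option` chaining. This is a sketch, not the code in the PR: `finalProps` is a hypothetical helper, and the `"provider"` key is assumed to match `TableCatalog.PROP_PROVIDER`:

```scala
// Sketch of provider precedence for CREATE TABLE LIKE; helper names are hypothetical.
object ProviderResolution {
  // Assumed to be the same key as Spark's TableCatalog.PROP_PROVIDER.
  val PropProvider = "provider"

  def finalProps(
      usingClause: Option[String],
      sourceProvider: Option[String],
      userProps: Map[String, String]): Map[String, String] = {
    // The USING clause wins; otherwise fall back to the source table's provider.
    val resolved = usingClause.orElse(sourceProvider)
    // Add the provider entry only when one was resolved.
    userProps ++ resolved.map(p => PropProvider -> p).toList
  }
}
```

What the target catalog does with the entry afterwards (store it as-is or validate it) is outside this sketch.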
| locationProp | ||
| try { | ||
| // Constraints from the source table are intentionally NOT copied for several reasons: |
This comment is too long to be included here, let's shorten it?
@gengliangwang @cloud-fan, can you folks help review as well?
cc @szehon-ho as well
| // If constraint copying is desired, use ALTER TABLE ADD CONSTRAINT after creation. | ||
| // If we wanted to support them in the future, the right approach would be to add an | ||
| // INCLUDING CONSTRAINTS clause (as PostgreSQL does) rather than copying blindly. | ||
| val tableInfo = new TableInfo.Builder() |
Good point. CatalogV2Util.convertTableProperties (used by CreateTableExec) calls withDefaultOwnership to add the current user as owner. We should do the same by adding CatalogV2Util.withDefaultOwnership(finalProps). I'll add that.
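As a rough illustration of the ownership step, the sketch below adds an owner entry to the final properties. The helper here is a hypothetical stand-in that takes the user name as a parameter; whether the real `CatalogV2Util.withDefaultOwnership` overwrites an existing owner is not stated in this thread, so the unconditional write is an assumption, as is the `"owner"` key mirroring `TableCatalog.PROP_OWNER`:

```scala
// Hypothetical stand-in for adding default ownership to table properties.
object OwnershipSketch {
  // Assumed to be the same key as Spark's TableCatalog.PROP_OWNER.
  val PropOwner = "owner"

  // Assumption: the current user is always recorded as owner, replacing any existing entry.
  def withDefaultOwnership(props: Map[String, String], currentUser: String): Map[String, String] =
    props + (PropOwner -> currentUser)
}
```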
| * - Source table's TBLPROPERTIES (user-specified `properties` are used instead) | ||
| * - Statistics, owner, create time | ||
| */ | ||
| case class CreateTableLikeExec( |
Do we have tests for V1 -> V2 both within a catalog and across catalogs?
Yes — "v2 target, v1 source: schema and partitioning are copied" tests V1 source (default.src in session catalog) → V2 target (testcat.dst). The "cross-catalog" and "3-part name" tests cover V2→V2 across catalogs.
|
The proposed behavior seems different from I wonder if we can delegate what to copy on |
| ResolvedTable(_, _, table, _), | ||
| fileFormat: CatalogStorageFormat, provider, properties, ifNotExists) => | ||
| CreateTableLikeExec( | ||
| catalog.asTableCatalog, ident, table, fileFormat, provider, properties, ifNotExists) :: Nil |
The three CreateTableLike match cases (for ResolvedTable, ResolvedPersistentView, ResolvedTempView) are nearly identical. Consider consolidating into a single pattern:
case CreateTableLike(
    ResolvedIdentifier(catalog, ident), source,
    fileFormat: CatalogStorageFormat, provider, properties, ifNotExists) =>
  val table = source match {
    case ResolvedTable(_, _, t, _) => t
    case ResolvedPersistentView(_, _, meta) => V1Table(meta)
    case ResolvedTempView(_, meta) => V1Table(meta)
  }
  CreateTableLikeExec(
    catalog.asTableCatalog, ident, table, fileFormat, provider, properties, ifNotExists) :: Nil
Good suggestion. I can refactor to the single pattern you proposed, with the source table resolved in an inner match. This is cleaner and removes the duplication. I wonder if we have precedents of DDL which consolidate both V1 and V2 commands?
| targetCatalog: TableCatalog, | ||
| targetIdent: Identifier, | ||
| sourceTable: Table, | ||
| fileFormat: CatalogStorageFormat, |
fileFormat: CatalogStorageFormat carries inputFormat/outputFormat/serde fields, but only locationUri is used (line 84). Consider narrowing the exec's parameter to location: Option[URI] to make the contract explicit, leaving the full CatalogStorageFormat only in the logical plan (where the V1 fallback path needs it).
Valid. Only locationUri is used in CreateTableLikeExec. I'll change the exec's parameter to location: Option[URI] and extract it at the DataSourceV2Strategy callsite.
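A minimal sketch of the narrowing agreed on here, using simplified stand-in types rather than Spark's classes: `StorageFormat` models only the `CatalogStorageFormat` fields named in the review, and `LikeExec` and `plan` are hypothetical names for the exec node and the strategy callsite:

```scala
import java.net.URI

// Simplified model: the logical plan keeps the full storage format,
// while the exec node receives only the location it actually uses.
object LocationNarrowing {
  // Hypothetical stand-in for CatalogStorageFormat.
  final case class StorageFormat(
      locationUri: Option[URI],
      inputFormat: Option[String],
      outputFormat: Option[String],
      serde: Option[String])

  // Hypothetical exec node after narrowing its parameter to `location`.
  final case class LikeExec(targetName: String, location: Option[URI])

  // The callsite extracts the location; the remaining fields never reach the exec.
  def plan(targetName: String, fileFormat: StorageFormat): LikeExec =
    LikeExec(targetName, fileFormat.locationUri)
}
```

The narrower signature makes it visible in the type that the serde and input/output format fields have no effect on the V2 path.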
| val v1 = "CREATE TABLE table1 LIKE table2" | ||
| // Helper to extract fields from the new CreateTableLike unresolved plan. | ||
| // The parser now emits CreateTableLike (v2 logical plan) instead of | ||
| // CreateTableLikeCommand, so both name and source are unresolved identifiers. |
The source is UnresolvedTableOrView, not an unresolved identifier:
| // CreateTableLikeCommand, so both name and source are unresolved identifiers. | |
| // CreateTableLikeCommand, so the name is an UnresolvedIdentifier and the source is an UnresolvedTableOrView. |
Good catch, I'll apply the suggestion.
BTW, consider unifying to a single CreateTableLikeExec. The current PR keeps two execution paths: V1 fallback via CreateTableLikeCommand (for V1-V1 cases) and the new CreateTableLikeExec (for V2 targets). The test "v2 source, v1 target" already proves CreateTableLikeExec works for session catalog targets via V2SessionCatalog.
| * @param properties User-specified TBLPROPERTIES. | ||
| * @param ifNotExists IF NOT EXISTS flag. | ||
| */ | ||
| case class CreateTableLike( |
Can we have one single command (UnaryRunnableCommand)? I thought that's the preferred way now, to reduce plan complexity in the different stages.
## What changes were proposed in this pull request?

Previously, `CREATE TABLE LIKE` was implemented only via `CreateTableLikeCommand`, which bypassed the V2 catalog pipeline entirely. This meant:

- 3-part names (catalog.namespace.table) caused a parse error
- 2-part names targeting a V2 catalog caused `NoSuchDatabaseException`

This PR adds a V2 execution path for `CREATE TABLE LIKE`:

- Grammar: change `tableIdentifier` (2-part max) to `identifierReference` (N-part) for both target and source, consistent with all other DDL commands
- Parser: emit `CreateTableLike` (new V2 logical plan) instead of `CreateTableLikeCommand` directly
- `ResolveCatalogs`: resolve the target `UnresolvedIdentifier` to `ResolvedIdentifier`
- `ResolveSessionCatalog`: route back to `CreateTableLikeCommand` when both target and source are V1 tables/views in the session catalog (V1->V1 path)
- `DataSourceV2Strategy`: convert `CreateTableLike` to new `CreateTableLikeExec`
- `CreateTableLikeExec`: physical exec that copies schema and partitioning from the resolved source `Table` and calls `TableCatalog.createTable()`

## Why are the changes needed?

`CREATE TABLE LIKE` was implemented solely via `CreateTableLikeCommand`, a V1-only command that bypasses the DataSource V2 analysis pipeline entirely. As a result, it was impossible to use `CREATE TABLE LIKE` to create a table in a non-session V2 catalog (e.g., testcat.dst): a 2-part name like testcat.dst was misinterpreted as database testcat in the session catalog and threw `NoSuchDatabaseException`, while a 3-part name like testcat.ns.dst was a parse error because the grammar only accepted 2-part tableIdentifier.

This change routes `CREATE TABLE LIKE` through the standard V2 DDL pipeline so that V2 catalog targets are fully supported, while preserving the existing V1 behavior when both target and source resolve to the session catalog.

## Does this PR introduce any user-facing change?

Yes. `CREATE TABLE LIKE` DDL command supports V2.

## How was this patch tested?

- `CreateTableLikeSuite`: new integration tests covering V2 target with V1/V2 source, cross-catalog, views as source, IF NOT EXISTS, property behavior, and V1 fallback regression, etc.
- `DDLParserSuite`: updated existing `create table like` test to match the new `CreateTableLike` plan shape; added 3-part name parsing test

## Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6
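The multi-part name handling that motivates this PR can be illustrated with a toy resolver. This is only a sketch of the idea that a leading name part matching a registered catalog selects that catalog; `resolve`, `Resolved`, the `registered` set, and the `default` current namespace are hypothetical, and the real resolution lives in Spark's analyzer:

```scala
// Toy model of catalog-aware name resolution; not Spark's analyzer logic.
object NameResolutionSketch {
  final case class Resolved(catalog: String, namespace: Seq[String], table: String)

  // Name of the session catalog, as it appears in the fully qualified identifiers above.
  val SessionCatalog = "spark_catalog"

  def resolve(
      parts: Seq[String],
      registered: Set[String],
      currentNamespace: Seq[String] = Seq("default")): Resolved = {
    require(parts.nonEmpty, "identifier must have at least one part")
    parts match {
      // A bare table name resolves in the current namespace of the session catalog.
      case Seq(table) => Resolved(SessionCatalog, currentNamespace, table)
      // A leading part that names a registered catalog selects that catalog.
      case head +: rest if registered.contains(head) => Resolved(head, rest.init, rest.last)
      // Otherwise the leading parts are a namespace in the session catalog.
      case _ => Resolved(SessionCatalog, parts.init, parts.last)
    }
  }
}
```

In this model `testcat.dst` resolves to table `dst` in catalog `testcat` instead of database `testcat` in the session catalog, and `testcat.ns.dst` resolves to namespace `ns` in `testcat`.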